A hybrid CNN-SVM model for enhanced autism diagnosis

Autism is a representative pervasive developmental disorder. It affects an individual's behavior and performance and may co-occur with other mental illnesses. Consequently, an effective diagnostic approach is invaluable for both therapeutic intervention and the timely provision of medical support. Currently, most research relies primarily on neuroimaging techniques for auxiliary diagnosis and does not take into account the social impairments that are characteristic of autism. To address this deficiency, this paper introduces a novel convolutional neural network-support vector machine model that integrates resting-state functional magnetic resonance imaging (fMRI) data with social responsiveness scale metrics for the diagnostic assessment of autism. We selected 821 subjects with social responsiveness scale measures from the publicly available Autism Brain Imaging Data Exchange dataset, including 379 subjects with autism spectrum disorder and 442 typical controls. After preprocessing the fMRI data, we compute the static and dynamic functional connectivity for each subject. Subsequently, convolutional neural networks and attention mechanisms are used to extract their respective features. The extracted features, combined with the social responsiveness scale features, are then used as inputs to the support vector machine to classify autistic patients and typical controls. The proposed model identifies salient features within the static and dynamic functional connectivity, offering a possible biological foundation for clinical diagnosis. By incorporating the behavioral assessments, the model achieves a classification accuracy of 94.30%, providing more reliable support for auxiliary diagnosis.


Introduction
Autism spectrum disorder (ASD) is a neurodevelopmental disorder marked by language impairment, social impairment, and stereotyped behavior. According to statistical data, approximately one in every 100 children is diagnosed with autism [1]. Although its characteristics are often identifiable in early childhood, a definitive diagnosis typically requires an extended period of time. Given that individuals with ASD generally need specialized medical care to mitigate the risk of co-occurring mental health conditions, timely and accurate diagnosis is imperative.
The intricate physiological activities of the human body are orchestrated through the coordinated interplay among various brain regions, and the relationships between brain regions evolve over time. Leveraging these connections holds promise for diagnosing autism and for finding the areas that cause brain disorders. Resting-state functional magnetic resonance imaging (rs-fMRI), with its advantages of high spatio-temporal resolution and no requirement for elaborate tasks [2,3], has been extensively used to investigate a wide array of psychiatric conditions.
Constructing functional connectivity (FC) based on rs-fMRI is an efficient method for characterizing relationships among brain areas. Early studies often focused solely on static functional connectivity, which presupposes that brain interrelations remain constant throughout the whole scanning process. However, recent literature reveals that neural connections fluctuate during the scan, giving rise to the broader application of dynamic functional connectivity in brain activity research [4][5][6]. To furnish a more comprehensive evaluation of brain activity, this paper employs both static and dynamic functional connectivity to analyze differences between subjects with ASD and typical controls (TCs), aiming to identify areas causing ASD.
Nevertheless, analysis of brain activity alone may prove insufficient for the accurate diagnosis of ASD, necessitating the incorporation of additional information.
Current psychiatric screening or diagnostic procedures predominantly rely on behavioral observations rooted in symptomatology. For example, the Social Responsiveness Scale-Second Edition (SRS-2) serves as an early screening tool for ASD [7], quantifying social functionality in five aspects: awareness, cognition, communication, motivation, and mannerisms. Substantial evidence supports its effectiveness and sensitivity in identifying autism symptoms in school-age children. However

Materials and methods
The whole process of this experiment is illustrated in Figure 1. More details of data preprocessing, staticCNN and dynamicCNN are given in the following sections.

Data preprocessing
First, we select resting-state fMRI data that include SRS metrics from the publicly available Autism Brain Imaging Data Exchange (ABIDE) dataset [11]. The distribution and age of the selected subjects are shown in Table 1. After acquiring the data from the ABIDE dataset, we perform the following preprocessing steps:
1. Extract the evaluation scores of the five aspects of the SRS, convert them into a one-dimensional vector, and send it to the final SVM for classification, as depicted in the SRS feature in Figure 1.
2. Remove the first ten time points of each subject to mitigate errors caused by the instability of the gradient magnetic field at the beginning of the scan. Correct slice scan time so that the resampled data can be treated as scanned at the same time point. Carry out head motion correction to mitigate the impact of head movements. Perform nuisance regression to eliminate the influence of extraneous factors.
This study uses SPM12 and DPARSF [13] to preprocess the fMRI data as described above, and the extracted time series are used for subsequent processing.

Static functional connectivity
The construction of a static functional connectivity matrix is based on the Pearson correlation coefficients between the regions of interest in the brain, as shown in Eq 1.
\[ \rho_{X,Y} = \frac{\sum_{i}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i}(x_i-\bar{x})^{2}}\,\sqrt{\sum_{i}(y_i-\bar{y})^{2}}} \tag{1} \]
where X and Y are two distinct brain regions, while $x_i$ and $y_i$ denote the corresponding time series.
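As an illustration of Eq 1, the following minimal sketch computes a static FC matrix from ROI time series with NumPy. The function name `static_fc`, the array layout (time points in rows, ROIs in columns) and the synthetic input are assumptions made for this example; they are not taken from the authors' code.

```python
import numpy as np

def static_fc(ts: np.ndarray) -> np.ndarray:
    """Pearson correlation between every pair of ROI time series.

    ts has shape (n_timepoints, n_rois); the result is (n_rois, n_rois).
    """
    centered = ts - ts.mean(axis=0, keepdims=True)   # x_i - mean(x) for each ROI
    cross = centered.T @ centered                    # sums of products of deviations
    norm = np.sqrt(np.diag(cross))                   # sqrt of summed squared deviations
    return cross / np.outer(norm, norm)

# Synthetic stand-in for one subject: 150 time points, 116 AAL regions.
rng = np.random.default_rng(0)
ts = rng.standard_normal((150, 116))
fc = static_fc(ts)
```

The result is symmetric with ones on the diagonal and agrees with `np.corrcoef(ts, rowvar=False)`.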

Dynamic functional connectivity
Throughout the scanning process, the functional connectivity between different brain regions exhibits temporal fluctuations, indicating that a representation based solely on static functional connectivity would be overly simplistic. This has led to the introduction of dynamic functional connectivity, which also serves to enrich the data set. Sliding-window analysis is a method often considered for dynamic functional connectivity. It describes changes in the brain's connection patterns by using a fixed window size and moving the window at a specific stride. However, there is no consensus on the optimal size of the fixed temporal window. Additionally, variations have been observed in connectivity matrices at different frequencies [14,15]. These issues cannot be solved by the sliding-window method. We therefore perform a time-frequency analysis of the signal. We use wavelet analysis to characterize dynamic functional connectivity, since it extracts information from signals through operations such as scaling and translation, enabling a multi-scale, fine-grained examination.
Initially, we employ the continuous wavelet transform to process the time series of each brain area, as given by Eq 2.
\[ W_x(s,\tau) = \frac{1}{\sqrt{s}} \int x(t)\,\phi^{*}\!\left(\frac{t-\tau}{s}\right) dt \tag{2} \]
where s is the wavelet scale, τ denotes the translation value, x(t) is the signal, ϕ represents a mother wavelet and * denotes the complex conjugate [16]. The scale s controls the dilation and contraction of the wavelet function and is inversely proportional to frequency. The translation τ controls the shift of the wavelet function along the time axis and corresponds to time points. By traversing the required scales and translations, we obtain the spectrogram of the non-stationary signal for subsequent analysis. We employ the Morlet wavelet as the mother wavelet, which offers an optimal ratio between frequency bands and wavelet scales, facilitating the understanding of data within specific frequency ranges [17]. In our analysis, we partition the frequency range of 0.01-0.08 Hz into 40 segments, and the translation τ corresponds to the sampled time points.
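The transform described above can be sketched with a direct discretization of the integral. The sketch below is only an illustration under stated assumptions: a Morlet centre frequency of 6, a repetition time of 2 s, 40 evenly spaced frequencies between 0.01 and 0.08 Hz, and the usual Morlet relation between Fourier frequency and scale. None of these values, nor the helper name `cwt_morlet`, is confirmed by the paper.

```python
import numpy as np

OMEGA0 = 6.0  # assumed Morlet centre frequency (dimensionless)

def cwt_morlet(x, tr, freqs):
    """Continuous wavelet transform of x, evaluated at every sampled time point.

    Returns a complex array of shape (len(freqs), len(x)).
    """
    t = np.arange(x.size) * tr
    # Convert each target frequency to the Morlet scale with the same Fourier period.
    scales = (OMEGA0 + np.sqrt(2.0 + OMEGA0 ** 2)) / (4.0 * np.pi * freqs)
    out = np.empty((freqs.size, x.size), dtype=complex)
    for i, s in enumerate(scales):
        eta = (t[:, None] - t[None, :]) / s            # eta[j, k] = (t_j - tau_k) / s
        psi_conj = np.pi ** -0.25 * np.exp(-1j * OMEGA0 * eta - 0.5 * eta ** 2)
        # Riemann sum over t_j of x(t_j) * conj(psi((t_j - tau_k) / s)) * dt / sqrt(s)
        out[i] = (x[:, None] * psi_conj).sum(axis=0) * tr / np.sqrt(s)
    return out

# Toy signal: a 0.05 Hz oscillation sampled every 2 s (assumed TR).
tr = 2.0
freqs = np.linspace(0.01, 0.08, 40)
x = np.sin(2 * np.pi * 0.05 * np.arange(200) * tr)
w = cwt_morlet(x, tr, freqs)
mean_power = (np.abs(w) ** 2)[:, 50:150].mean(axis=1)  # ignore the edges
peak_freq = freqs[np.argmax(mean_power)]
```

For this toy signal the power concentrates in the frequency bin closest to 0.05 Hz, which is the behaviour the spectrogram is meant to capture.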
After obtaining the time-frequency spectra for each region of interest, we proceed to compute their dynamic functional connectivity. To describe the time-frequency relationship between two signals, we first introduce the concept of cross-wavelet power [16], which identifies regions of common high power between the two signals, as shown in Eq 3.
\[ W_{xy}(s,\tau) = W_x(s,\tau)\,W_y^{*}(s,\tau) \tag{3} \]
where $W_x$ and $W_y$ are the continuous wavelet transforms of x and y, respectively, and $|W_{xy}(s,\tau)|$ is the cross-wavelet power. To further describe the coherence of the two signals in time-frequency space as the functional connectivity between two brain areas, we introduce the concept of wavelet coherence [16], calculated as in Eq 4.
\[ WC(s,\tau) = \frac{\left|S\!\left(s^{-1} W_{xy}(s,\tau)\right)\right|^{2}}{S\!\left(s^{-1}\left|W_x(s,\tau)\right|^{2}\right)\, S\!\left(s^{-1}\left|W_y(s,\tau)\right|^{2}\right)} \tag{4} \]
where S(·) denotes a smoothing operator over scale and time, for which we employ the commonly used moving average. The resulting amount of data is very large. Moreover, the data obtained by different institutions have different numbers of time points n, so they cannot easily be passed to the same neural network for learning. To this end, we apply Principal Component Analysis (PCA) to the matrices [18]. To unify the data from different institutions, we perform dimensionality reduction along the temporal dimension, as detailed in the following procedure:
1. Calculate the wavelet coherence for two regions of interest to obtain a matrix $WC \in \mathbb{R}^{m \times n}$.
2. For the matrix $WC$, remove the average value to get $\widehat{WC} = [w_1, ..., w_n]$.
3. Use the eigenvalue decomposition method to calculate the eigenvalues and the corresponding unit eigenvectors of the covariance matrix of $\widehat{WC}$. Suppose there are l eigenvalues and unit eigenvectors, with $\lambda_1, \lambda_2, ..., \lambda_l$ arranged from large to small.
4. In order to retain 99% of the information, find the smallest k that satisfies $\frac{\sum_{i=1}^{k}\lambda_i}{\sum_{i=1}^{l}\lambda_i} \geq 0.99$. The corresponding k unit eigenvectors form an eigenvector matrix P.
5. Finally, the matrix is mapped to the space spanned by the corresponding eigenvectors, that is, $\widehat{WC}\,P \in \mathbb{R}^{m \times k}$.
After performing principal component analysis on all matrices $WC \in \mathbb{R}^{m \times n}$, we find that k = 1 satisfies the condition in step 4. We thus obtain a 116 × 116 × 40 tensor for each subject as the dynamic functional connectivity.
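The dimensionality reduction steps above can be sketched as follows. This is a minimal illustration, assuming that "removing the average value" means subtracting the mean of each time-point column and that the covariance is taken over the n time points; the function name `reduce_time_axis` and the synthetic matrix are hypothetical.

```python
import numpy as np

def reduce_time_axis(wc, keep=0.99):
    """Project an (m, n) coherence matrix onto its leading temporal components.

    Returns the (m, k) projection and k, the smallest number of components
    whose eigenvalues account for at least `keep` of the total.
    """
    centered = wc - wc.mean(axis=0, keepdims=True)      # step 2: remove the average
    cov = centered.T @ centered / (wc.shape[0] - 1)     # (n, n) covariance over time points
    eigvals, eigvecs = np.linalg.eigh(cov)              # step 3: ascending eigenvalues
    order = np.argsort(eigvals)[::-1]                   # sort from large to small
    eigvals = np.clip(eigvals[order], 0.0, None)        # guard against tiny negative values
    eigvecs = eigvecs[:, order]
    ratio = np.cumsum(eigvals) / eigvals.sum()
    k = int(np.searchsorted(ratio, keep) + 1)           # step 4: smallest k reaching `keep`
    p = eigvecs[:, :k]                                  # eigenvector matrix P, shape (n, k)
    return centered @ p, k                              # step 5: shape (m, k)

# Synthetic matrix dominated by a single temporal pattern (40 scales, 150 time points).
rng = np.random.default_rng(0)
wc = np.outer(rng.standard_normal(40), rng.standard_normal(150))
wc += 1e-3 * rng.standard_normal((40, 150))
reduced, k = reduce_time_axis(wc)
```

Because the output has m rows regardless of n, matrices from sites with different scan lengths end up with the same shape once the same k is used.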

Neural network
As illustrated in Figure 1, the input to our model is based on data from SRS tables, static FC and dynamic FC. Here, we employ CNNs to extract features from static FC and dynamic FC. For ordinary images, feature extraction is typically achieved with 3 × 3 or 5 × 5 convolution kernels [19,20]. However, in an individual FC matrix, each row or column corresponds to the correlations between one brain area and all other brain areas. Such small square kernels cannot summarize this information, and the local features they extract have no reasonable interpretation. Therefore, we believe that a more reasonable convolution kernel is 1 × n or n × 1, where n corresponds to the number of regions of interest, that is, 116.
The convolution kernel proposed here is more capable of extracting brain areas corresponding to functional connections that make greater contributions to classification.
For static FC, each subject corresponds to a 116 × 116 × 1 tensor, as shown in Fig 2. We send it into the CNN. The first layer comprises 32 filters, and the corresponding convolution kernel is 1 × 116. After passing through the batch normalization (BN) layer and rectified linear unit (ReLU) layer, the output is sent to the second convolution layer. The second layer comprises 64 filters, and the corresponding convolution kernel is 116 × 1.
After passing through the BN layer and ReLU layer, the output is sent to the third convolution layer. The third layer comprises 8 filters, and the corresponding convolution kernel is 1 × 1. Finally, the output goes through the BN layer and ReLU layer. The purpose of the third convolutional layer is to prevent the extracted feature dimension from being too large and affecting subsequent classification. In addition to this step, we also use a dropout layer to prevent overfitting, where the dropout rate is set to 50%. All activation functions employed are of the LeakyReLU type (α = 0.01).
The most discriminative deep features need to be distinguished through the attention mechanism [21], which follows the dropout layer. This attention network consists of two fully connected layers: the first has 8 neurons and is followed by a ReLU layer, and the second contains 16 neurons, which is the same as the number of input neurons. The output then passes through the Sigmoid function to obtain the normalized weight of each input feature. These normalized weights are used for weighted summation to extract the requisite static functional connectivity features. To facilitate backpropagation for parameter tuning, an additional layer is added, the size of which corresponds to the number of classification categories, which is 2.
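To make the role of the 1 × 116 and 116 × 1 kernels concrete, the sketch below walks one FC matrix through the three convolutions and the attention gate using plain NumPy and random, untrained weights. It is a shape walk-through under simplifying assumptions, not the authors' network: batch normalization and dropout are omitted, the attention output size is tied to the length of the incoming feature vector (8 here), and the attention weights are applied element-wise. All names are hypothetical.

```python
import numpy as np

def leaky_relu(a, alpha=0.01):
    return np.where(a > 0, a, alpha * a)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def static_cnn_forward(fc, params):
    """Forward pass for one (116, 116) FC matrix; returns every intermediate."""
    w1, w2, w3, a1, a2 = params
    # Layer 1: each 1 x 116 kernel spans a full row, so row i yields one value per filter.
    h1 = leaky_relu(w1 @ fc.T)                          # (32, 116)
    # Layer 2: each 116 x 1 kernel spans all regions across the 32 input channels.
    h2 = leaky_relu(np.einsum("dci,ci->d", w2, h1))     # (64,)
    # Layer 3: 1 x 1 convolution reduces 64 channels to 8 features.
    h3 = leaky_relu(w3 @ h2)                            # (8,)
    # Attention: FC -> ReLU -> FC -> sigmoid gives one weight per feature.
    gate = sigmoid(a2 @ np.maximum(a1 @ h3, 0.0))       # (8,)
    return h1, h2, h3, gate, gate * h3

rng = np.random.default_rng(0)
n_roi = 116
params = (
    0.1 * rng.standard_normal((32, n_roi)),       # layer 1: 32 filters of size 1 x 116
    0.1 * rng.standard_normal((64, 32, n_roi)),   # layer 2: 64 filters of size 116 x 1
    0.1 * rng.standard_normal((8, 64)),           # layer 3: 8 filters of size 1 x 1
    0.1 * rng.standard_normal((8, 8)),            # attention hidden layer, 8 neurons
    0.1 * rng.standard_normal((8, 8)),            # attention output, one weight per feature
)
fc = np.corrcoef(rng.standard_normal((150, n_roi)), rowvar=False)
h1, h2, h3, gate, features = static_cnn_forward(fc, params)
```

Each row of the first-layer weight matrix assigns one coefficient to every brain region, which is what later allows the weights to be read as region-level importance.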
The loss between the actual labels and the predicted outcomes is minimized via the adaptive moment estimation (Adam) optimizer, with a learning rate of 0.0001. The loss function is cross-entropy. However, the focus is not on the network's predictive capacity. We use this output only to train the network parameters, so that the output of the final hidden layer serves as the learned static functional connectivity features, as shown in the static feature in Fig 1.
Subsequently, SRS features are concatenated with those extracted from both static FC and dynamic FC. This composite feature set is then submitted to an SVM with a linear kernel for classification. This experiment uses 10-fold cross-validation. We use an SVM instead of a multi-layer perceptron (MLP) for classification because, in our experiments, the MLP model generalizes poorly and performs poorly on the test set. In the support vector machine part, we use the common linear kernel, because the extracted feature dimension is small relative to the size of the data set.
To illustrate the advantages of the SVM, we also conduct experiments using an MLP and a random forest (RF) for classification. The MLP consists of two hidden layers with 64 and 16 neurons, respectively. The number of neurons in the input layer is the dimension obtained after concatenating the three features, and the output layer has 2 neurons. For the random forest, 100 trees are used. These experiments adopt 10-fold cross-validation, and the results are averaged over several experiments.
This figure compares the performance of this paper with previous papers. Wang [22]; Huang [23]; Yin [24]; Jiang [25]; Bhandage [26].

Analysis
Previous research has revealed differences in the complex relationships between TCs and subjects with ASD, and has shown that these differences in FC exist in multiple brain regions. To identify which brain regions may have undergone alterations, we leverage the convolutional layer weights of our selected model. This allows us to ascertain which interactions between brain areas have a more pronounced impact on classification, suggesting that these areas differ in important ways between autistic individuals and typical controls. Examining the network weights of the first layer of the static FC part and of the second layer of the dynamic FC part, we find that the corresponding weight is a 32 × 116 matrix. By taking the absolute value of the weights in each channel and summing over the channels, we obtain a 1 × 116 matrix. We then select the brain areas corresponding to the 10 largest values as the most discriminative brain areas, as shown in Table 2 and Fig
As shown in Table 2, for both static FC and dynamic FC, Heschl's gyrus and the superior frontal gyrus are among the most effective areas for classification. We also find that regions in the cerebellum account for a certain proportion in both cases [27,28]. Previous classification models and physiological studies have reached similar conclusions that differences exist in these brain areas [29][30][31][32]. At the same time, based on the weights of the first convolution layer of the dynamic FC part, we can also identify narrower frequency bands in which there may be greater differences in brain activity between subjects with ASD and TCs. After analyzing the weights, we find that the frequency bands 0.04231-0.04769 Hz and 0.06385-0.067436 Hz contribute
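The region-ranking step described in this section (absolute values summed over the 32 channels, then the 10 largest entries) can be sketched as below. The function name `top_regions` and the synthetic weight matrix are hypothetical, and the indices refer to positions in the weight matrix rather than to named AAL regions.

```python
import numpy as np

def top_regions(weights, k=10):
    """Rank regions by summed absolute kernel weight.

    weights has shape (n_channels, n_rois); returns the indices of the k
    highest-scoring regions (largest first) and their scores.
    """
    importance = np.abs(weights).sum(axis=0)      # collapse 32 x 116 to 116 scores
    order = np.argsort(importance)[::-1][:k]      # largest scores first
    return order, importance[order]

# Synthetic 32 x 116 kernel weights with three deliberately emphasized regions.
rng = np.random.default_rng(0)
weights = 0.01 * rng.standard_normal((32, 116))
weights[:, [3, 50, 100]] += 5.0
indices, scores = top_regions(weights)
```

Mapping the returned indices to region names requires the label order of the atlas used during time-series extraction.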

Discussion
To delineate the physiological disparities between individuals with ASD and TCs, we combine static FC and dynamic FC and define specialized convolutional kernels for feature extraction from FC matrices. This method facilitates the identification of brain regions that contribute most to distinguishing between the two groups, some of which have been corroborated by previous studies. Moreover, we identify two brain regions, Heschl's gyrus and the superior frontal gyrus, that contribute substantially to classification in both static FC and dynamic FC. Additionally, we pinpoint narrower frequency bands where dynamic FC differences may exist. To further improve the diagnostic capabilities for autism, we incorporate SRS scores and combine them with FC features to assist the diagnosis of ASD. In this way, our model achieves better classification results than some previous papers.
For future research, the integration of more diverse data sources for classification could be explored, for instance, by combining various brain templates or using other time-frequency analysis methods to obtain additional FC information. Another direction is to incorporate text obtained from communicating with patients and to apply natural language processing to assist in screening. Introducing this information not only advances the prospects of AI in medical diagnostics but may also reduce false positives and help identify those who pretend to be patients. In addition, the proposed model holds potential for aiding the diagnosis of other psychological conditions, such as depression.

Conclusion
In summary, this paper proposes a hybrid CNN-SVM network that combines autism early screening tools with resting-state fMRI data, achieving better classification performance. The model not only identifies brain areas that exert significant influence on classification outcomes but also elucidates the frequency bands that impact classification. These findings offer valuable clues to the etiological mechanisms and the determination of biological markers for autism, and can also help diagnose patients with autism.
This work was supported by the National Natural Science Foundation of China under grant number 11671354.

Fig 1. The entire process of this experiment. Details of the staticCNN and dynamicCNN parts are illustrated in Figure 2 and Figure 3, respectively.

3. Register the structural image data and then map it to the standard MNI space. This aligns the images of each subject so that the anatomical structures of different subjects correspond to the same voxels. Perform spatial smoothing to enhance registration effects.
4. Employ band-pass filtering (0.01-0.08 Hz). Given that magnetic resonance signals are primarily low-frequency signals, this frequency range is chosen to minimize the impact of physiological noise from cardiac (∼0.15 Hz) and respiratory (∼0.3 Hz) activities [12].
5. Utilize the Automated Anatomical Labeling (AAL) atlas, which partitions the brain into 116 regions. These regions serve as our regions of interest (ROIs), and the corresponding time series are extracted.
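Two of the preprocessing steps, discarding the first ten time points and band-pass filtering to 0.01-0.08 Hz, are simple enough to illustrate on ROI time series. The sketch below is not the SPM12/DPARSF pipeline used in the study; it is a stand-in that applies an ideal frequency-domain filter with NumPy, and the repetition time of 2 s, the function names and the toy signals are all assumptions.

```python
import numpy as np

def drop_initial_volumes(ts, n_drop=10):
    """Discard the first n_drop time points of a (time, rois) array."""
    return ts[n_drop:]

def ideal_bandpass(ts, tr, low=0.01, high=0.08):
    """Zero every Fourier component outside [low, high] Hz along the time axis."""
    n = ts.shape[0]
    freqs = np.fft.rfftfreq(n, d=tr)
    spectrum = np.fft.rfft(ts, axis=0)
    keep = (freqs >= low) & (freqs <= high)
    spectrum[~keep] = 0.0
    return np.fft.irfft(spectrum, n=n, axis=0)

# Toy data: a slow 0.05 Hz component plus a fast 0.20 Hz component.
tr = 2.0                                    # assumed repetition time in seconds
t = np.arange(210) * tr
slow = np.sin(2 * np.pi * 0.05 * t)         # inside the pass band
fast = 0.5 * np.sin(2 * np.pi * 0.20 * t)   # outside the pass band
ts = np.stack([slow + fast, slow - fast], axis=1)   # two toy "ROIs"
filtered = ideal_bandpass(drop_initial_volumes(ts), tr)
```

In this toy case the filter returns the slow component of each column and removes the fast one.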
Here, $\bar{x}$ and $\bar{y}$ are the mean values of $x_i$ and $y_i$, respectively. The underlying assumption of static functional connectivity is that the interconnections between different brain areas remain invariant throughout the entire scanning procedure, and the Pearson correlation coefficient characterizes this relationship [5]. The resulting static functional connectivity matrix has dimensions of 116 × 116, as illustrated in the input part of Figure 2.

Finally, after processing different times and scales, we convert the signals of any two regions of interest into a matrix $WC \in \mathbb{R}^{m \times n}$, where m corresponds to the number of chosen scales, specifically m = 40, and n corresponds to the number of time points. In this way, for the same individual, we can get 6612 different matrices.
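How a pair of ROI signals becomes one m × n coherence matrix can be sketched as follows. This is an illustrative implementation under several assumptions that the text does not fix: a Morlet centre frequency of 6, a repetition time of 2 s, 40 evenly spaced frequencies in 0.01-0.08 Hz, and a moving average over time only with a window of 9 samples as the smoothing operator. The helper names are hypothetical.

```python
import numpy as np

OMEGA0 = 6.0  # assumed Morlet centre frequency

def cwt_morlet(x, tr, freqs):
    """Morlet wavelet transform of x; returns (n_freqs, n_times) and the scales."""
    t = np.arange(x.size) * tr
    scales = (OMEGA0 + np.sqrt(2.0 + OMEGA0 ** 2)) / (4.0 * np.pi * freqs)
    rows = []
    for s in scales:
        eta = (t[:, None] - t[None, :]) / s
        psi_conj = np.pi ** -0.25 * np.exp(-1j * OMEGA0 * eta - 0.5 * eta ** 2)
        rows.append((x[:, None] * psi_conj).sum(axis=0) * tr / np.sqrt(s))
    return np.array(rows), scales

def smooth_time(a, width):
    """Moving average along the time axis, applied to each scale separately."""
    kernel = np.ones(width) / width
    return np.array([np.convolve(row, kernel, mode="same") for row in a])

def wavelet_coherence(x, y, tr, freqs, width=9):
    """Squared wavelet coherence between x and y, shape (n_freqs, n_times)."""
    wx, scales = cwt_morlet(x, tr, freqs)
    wy, _ = cwt_morlet(y, tr, freqs)
    inv_s = 1.0 / scales[:, None]
    cross = wx * np.conj(wy)                           # cross-wavelet spectrum
    num = np.abs(smooth_time(inv_s * cross, width)) ** 2
    den = (smooth_time(inv_s * np.abs(wx) ** 2, width)
           * smooth_time(inv_s * np.abs(wy) ** 2, width))
    return num / den

rng = np.random.default_rng(0)
tr, freqs = 2.0, np.linspace(0.01, 0.08, 40)
x, y = rng.standard_normal(120), rng.standard_normal(120)
wc_xy = wavelet_coherence(x, y, tr, freqs)   # one matrix for one ROI pair
wc_xx = wavelet_coherence(x, x, tr, freqs)   # a signal is fully coherent with itself
```

With smoothing over time only, the 1/s factors cancel within each scale; they are kept so that the code mirrors the usual definition and would still be correct if smoothing across scales were added.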

Fig 2. The entire process of processing static functional connectivity.

Fig 3. The entire process of processing dynamic functional connectivity.

Results
Classification comparison
According to the results of cross-validation, sensitivity (SEN), specificity (SPE) and accuracy (ACC) are used as evaluation indicators for the classification results. Here, true positives (TPs) are correctly diagnosed ASD patients, true negatives (TNs) are correctly diagnosed TCs, and false positives (FPs) and false negatives (FNs) are, respectively, TCs incorrectly diagnosed as ASD patients and ASD patients incorrectly diagnosed as TCs. SEN, SPE and ACC are calculated as follows,
\[ SEN = \frac{TP}{TP + FN}, \qquad SPE = \frac{TN}{TN + FP}, \qquad ACC = \frac{TP + TN}{TP + FP + TN + FN} \]
ACC reflects the model's ability to predict correctly, while SEN and SPE reflect the model's ability to identify subjects with ASD and negative examples, respectively. After several rounds of ten-fold cross-validation, our framework achieves 94.30 (±0.73)% ACC, 95.14 (±1.59)% SEN and 94.15 (±2.24)% SPE. Fig 4 illustrates the performance of three different classifiers.
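The three indicators follow directly from the confusion-matrix counts, as the short sketch below shows. The counts in the example are invented for illustration and are not results from this study; the function name is hypothetical.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sen = tp / (tp + fn)                    # share of ASD subjects correctly identified
    spe = tn / (tn + fp)                    # share of TCs correctly identified
    acc = (tp + tn) / (tp + tn + fp + fn)   # share of all subjects correctly classified
    return sen, spe, acc

# Hypothetical counts for a single test fold of 82 subjects.
sen, spe, acc = classification_metrics(tp=36, tn=41, fp=3, fn=2)
```

Under repeated cross-validation, these values would be computed per fold and then averaged.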

Fig 4. The performance of different classifiers.

Fig 6. This figure shows the most discriminating brain areas.
This questionnaire mainly focuses on social communication, and only a few items involve stereotyped behavior. Researchers are still gathering more data on the stability and effectiveness of the SRS-2 [8][9][10]. While other diagnostic tools such as the Autism Diagnostic Observation Schedule, the Childhood Autism Rating Scale, and the Autism Diagnostic Interview-Revised are commonly employed in clinical settings, these methods are time-consuming. Using the scale results alone is not enough to achieve more accurate diagnosis. Consequently, we integrate analyses of functional connectivity to better assist diagnosis.
Machine learning is widely used for diagnostic assistance, including traditional machine learning methods, such as support vector machines (SVM) and random forests (RF), and deep learning models, such as convolutional neural networks (CNN). While traditional machine learning algorithms perform well in classification tasks, their accuracy still needs to be improved, and they cannot extract deep features. Deep learning models, on the other hand, can learn deep features and improve classification accuracy, but they still process functional connectivity in the same way as general images, without adapting to its unique characteristics. To harness the full potential of early screening questionnaires and brain functional connectivity for autism diagnosis, this paper introduces a hybrid CNN-SVM model. This model employs a CNN architecture to extract deep features from both static and dynamic functional connectivity. In the learning process, feature extraction based on the frequency band and convolution kernels based on the functional connectivity matrix are used, and the attention mechanism is introduced to weight the learned features. Finally, the learned features, combined with the features from the SRS, are sent to the SVM for classification. After studying 379 subjects with ASD and 442 typical controls, our model achieves good classification accuracy and provides the most discriminating brain regions and frequency bands.

Table 1. Distribution of the data from the rs-fMRI ABIDE database used in this study.

Table 2. The discriminating brain areas of static FC and dynamic FC.

Perhaps there are more differences in brain activity between subjects with ASD and TCs in these frequency bands, which requires further physiological experimental confirmation.